feat: Dynamic Open Graph meta tags for social media link previews#1251
feat: Dynamic Open Graph meta tags for social media link previews#1251kevinschaper merged 7 commits intomainfrom
Conversation
Enable entity-specific link previews on Slack, Twitter, Discord, Mastodon, Bluesky, and other social platforms by serving dynamic OG tags to crawler bots. Closes #1242
✅ Deploy Preview for monarch-app canceled.
|
PR Review: Dynamic Open Graph Meta Tags for Social Media Link PreviewsSummaryThis PR implements a solid architecture for serving dynamic Open Graph tags to social media crawlers while maintaining the SPA experience for regular users. The implementation is clean and well-documented. ✅ Strengths
🔍 Code Quality & Best Practicesbackend/src/monarch_py/api/meta.py:56-58Issue: Broad exception catching loses important error context. The current code catches all exceptions and returns 404, which masks genuine errors like database connection failures. Consider catching specific exceptions or re-raising non-404 errors as 500s. backend/src/monarch_py/api/meta.py:16-20Observation: Jinja environment is initialized at module level. If TEMPLATES_DIR does not exist, the app will fail at import time rather than startup. Consider validating TEMPLATES_DIR exists during initialization. services/nginx/config/default.conf:26Minor: CURIE regex could be more restrictive. The pattern matches paths like 🔒 Security✅ XSS Protection: Jinja2 autoescape properly enabled ⚡ PerformanceGood: Jinja environment cached at module level Suggestion: Add HTTP cache headers since entity data changes infrequently: headers={"Cache-Control": "public, max-age=3600"}🧪 Test CoverageGood Coverage: Happy path, 404 handling, XSS protection Missing Test Cases:
🎯 RecommendationsHigh Priority: None - production-ready as-is ✅ Medium Priority:
Low Priority:
✅ Final VerdictApproved with minor suggestions. Well-designed, secure, and well-tested. The nginx-based routing is excellent. Suggestions are optimizations, not blockers. Great work! 🎉 |
Pull Request Review: Dynamic Open Graph Meta TagsThis is a well-architected implementation of dynamic social media previews for entity pages. The approach is clean, well-tested, and follows best practices. Here's my detailed feedback: ✅ Strengths
🔍 Code Quality Observationsbackend/src/monarch_py/api/meta.pyGood:
Minor Suggestions:
services/nginx/config/default.confConcerns:
backend/src/monarch_py/api/templates/meta.htmlConcerns:
backend/tests/api/test_meta.pyGood:
Suggestions:
⚡ Performance Considerations
🛡️ Security Considerations
📝 Minor Issues
🎯 RecommendationsMust Fix:
Should Fix: Nice to Have: 📊 Overall AssessmentCode Quality: ⭐⭐⭐⭐ (4/5) This is solid work! The main concern is the missing proxy headers in nginx, which could cause incorrect URLs in certain deployment scenarios. Once that's addressed and a few edge case tests are added, this would be ready to merge. Great job on the comprehensive testing and documentation! 🎉 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1251 +/- ##
==========================================
+ Coverage 72.62% 73.51% +0.88%
==========================================
Files 93 95 +2
Lines 3306 3417 +111
==========================================
+ Hits 2401 2512 +111
Misses 905 905 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Add proxy_set_header directives to nginx bot proxy block for correct URL generation - Improve error handling: ValueError -> 404, unexpected errors -> 500 - Extract MAX_DESCRIPTION_LENGTH constant with word-boundary truncation - Add Cache-Control header (public, max-age=3600) to meta responses - Strengthen test assertions and add coverage for edge cases Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull Request Review: Dynamic Open Graph Meta Tags for Social Media Link PreviewsOverall AssessmentThis is a well-implemented feature with excellent documentation, comprehensive test coverage, and thoughtful design. The solution elegantly solves the SPA meta tag problem by detecting bots at the nginx layer and serving dynamic content from a FastAPI endpoint. ✅ Strengths
🔍 Issues and Recommendations1. Nginx Config: Potential Performance Issue (Medium Priority)Location: services/nginx/config/default.conf:46-61 The use of 2. Description Truncation Edge Case (Low Priority)Location: meta.py:85-86 If the description has no spaces, 3. Missing Rate Limiting (Medium Priority)Location: meta.py:37 The endpoint has no rate limiting. Consider adding it at nginx level or via FastAPI middleware to prevent aggressive bot crawling. 4. CURIE Regex Pattern Limitation (Low Priority)Location: services/nginx/config/default.conf:26 The pattern 5. Missing Image Validation (Low Priority)Location: meta.py:32-34 The default OG image path 💡 Future Enhancements
🧪 Test CoverageExcellent coverage with 8 unit tests. Minor gaps:
📋 SummaryRecommendation: APPROVE with minor suggestions This is production-ready code demonstrating strong security awareness, good engineering practices, and excellent documentation. The identified issues are low-priority items that can be addressed in follow-up PRs if needed. Great work on this feature! The architecture is clean and the implementation is solid. |
ptgolden
left a comment
There was a problem hiding this comment.
This looks good. My only feedback is the potential removal of that try/catch block and the nginx tests.
In an ideal world, we would have a slightly different architecture where this would not be necessary, and going to an entity page would just serve the meta tags properly no matter what. (i.e. the entity could be prefetched and included in the response as JSON so that the SPA could pick it up without having to immediately make a request). But that would be a much larger and more disruptive change, and it's not in the cards.
@kevinschaper you mentioned removing the meta endpoint from API documentation. I don't have an opinion on it either way. I don't think it hurts to leave it in, but I understand wanting to keep it "private".
Remove unnecessary try/catch — get_entity returns None (doesn't raise), and FastAPI handles uncaught exceptions as 500s already. Matches the pattern in entity.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These don't fit the project's testing framework and add maintenance burden for shell code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This is an internal endpoint for bot crawlers, not part of the public API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review: Dynamic Open Graph Meta Tags for Social Media Link PreviewsSummaryThis is a well-implemented feature that addresses a real need for better social media link previews. The architecture is clean, the code is readable, and the test coverage is excellent. The PR includes comprehensive documentation and handles edge cases thoughtfully. Strengths1. Clean ArchitectureThe nginx-based bot detection + FastAPI backend approach is elegant and performant:
2. Excellent Test CoverageThe test suite (backend/tests/api/test_meta.py) is comprehensive with 8 tests covering:
3. Security Conscious
4. Good DocumentationThe docs/link-previews.md file provides clear explanations of the architecture, testing instructions, and configuration guidance. Code Quality Issues1. Nginx Configuration - Use of if Directive (Low Priority)Location: services/nginx/config/default.conf:44-61 The implementation uses nginx if directives for conditional logic. While this works, nginx's if is notoriously tricky. The workaround of concatenating strings to create "BE" is clever and safe in this context. Consider documenting this pattern for future maintainers. 2. Description Truncation Edge Case (Very Low Priority)Location: backend/src/monarch_py/api/meta.py:74-75 If a description somehow had no spaces, rsplit(" ", 1) would return the full string. This would result in a 303-character description instead of 300. In practice, entity descriptions will always have spaces. 3. TEMPLATES_DIR Error Handling (Very Low Priority)Location: backend/src/monarch_py/api/meta.py:15-18 If TEMPLATES_DIR doesn't exist, this will fail at module import time. While fail-fast on import is acceptable, a clearer error message could help. Security Considerations1. Meta Refresh Open Redirect (Low - Currently Safe)Location: backend/src/monarch_py/api/templates/meta.html:28 The url variable is constructed from request headers and entity ID. Currently safe because nginx validates the CURIE format (lines 24-26), which prevents arbitrary URLs. However, this is a subtle dependency - if someone later modifies the nginx config to be more permissive, this could become an open redirect vulnerability. Recommendation: Add a comment in both the nginx config and the Python code noting this security dependency. 2. Rate Limiting (Low Priority - Infrastructure Concern)The endpoint doesn't have explicit rate limiting. Consider adding rate limiting at the nginx level for /v3/api/meta/* path, or document that this should be handled by infrastructure. Performance ConsiderationsCache Headers: Good choice of 1-hour cache (meta.py:88). Entity data doesn't change frequently. Solr Query Efficiency: Good use of extra=False to avoid fetching unnecessary association data (meta.py:53). TestingThe test suite is excellent and covers happy path, 404 handling, XSS protection, description truncation, missing data edge cases, and cache headers. All tests use proper mocking. Nice to Have: Consider adding tests for get_base_url() with various header combinations and very long entity names. Recommendations SummaryMust Fix: NoneShould Consider:
Nice to Have:
ApprovalThis is high-quality work that follows best practices. The implementation is clean, well-tested, and thoroughly documented. The architecture is sound and the code is maintainable. Recommendation: Approve with optional consideration of the Should Consider items above. Great work! |
Summary
Enables entity-specific link previews when sharing Monarch URLs on social media platforms (Slack, Twitter, Discord, Mastodon, Bluesky, etc.).
Closes #1242
How It Works
Architecture
Nginx Bot Detection
Nginx uses a
mapdirective to detect crawler User-Agents and routes them to the meta endpoint:Only CURIE-format paths (e.g.,
/MONDO:0005148) are eligible - regular pages like/aboutare unaffected.Meta Endpoint
The FastAPI endpoint fetches the entity from Solr and renders HTML with OG tags:
Dynamic URL Support
URLs are derived from request headers, so the same deployment works for both environments:
beta.monarchinitiative.org→ OG tags reference beta URLsmonarchinitiative.org→ OG tags reference production URLsSupported Platforms
Slackbot, Twitterbot, Facebook, LinkedIn, Discord, WhatsApp, Telegram, Mastodon, Bluesky, Googlebot, Bingbot, Embedly, Pinterest, Applebot
Files Changed
backend/src/monarch_py/api/meta.pybackend/src/monarch_py/api/templates/meta.htmlservices/nginx/config/default.confbackend/tests/api/test_meta.pydocs/link-previews.md